Abstract

This paper deals with nonparametric estimation in the heteroscedastic
regression $Y_i = f(X_i) + \xi_i$, $i = 1,\dots,n$, with incomplete
information, i.e. each real random variable $\xi_i$ has a density $g_i$
which is unknown to the statistician. The aim is to estimate the
regression function $f$ at a given point. Using a local polynomial fit
based on an M-estimator and applying Lepski's procedure for the
bandwidth selection, we construct an estimator which is adaptive over
the collection of isotropic Hölder classes. In particular, we establish
new exponential inequalities to control the deviations of local
M-estimators, which allow us to construct the minimax estimator. The
advantage of this estimator is that it does not depend on the densities
of the random errors: we only assume that the probability density
functions are symmetric and monotone on $\mathbb{R}_+$. It is important
to mention that our estimator is robust with respect to extreme values
of the noise.

1 Introduction

Let the statistical experiment be generated by the observation
$Z^{(n)} = (X_i, Y_i)_{i=1,\dots,n}$, $n \in \mathbb{N}^*$, where each
$(X_i, Y_i)$ satisfies the equation

$$Y_i = f(X_i) + \xi_i, \quad i = 1,\dots,n. \tag{1.1}$$

Here $f : [0,1]^d \to \mathbb{R}$ is an unknown function to be estimated
at a given point $x_0 \in [0,1]^d$ from the observation $Z^{(n)}$.

The real random variables $(\xi_i)_{i=1,\dots,n}$ (the noise) are
supposed to be independent and each variable $\xi_i$ has a symmetric
density $g_i(\cdot)$ with respect to the Lebesgue measure on
$\mathbb{R}$. We also assume that $g_i$ is monotone on $\mathbb{R}_+$
for any $i$.

The design points $(X_i)_{i=1,\dots,n}$ are independent and uniformly
distributed on $[0,1]^d$. The random vectors $(X_i)_{i=1,\dots,n}$ and
$(\xi_i)_{i=1,\dots,n}$ are independent.

Throughout the paper, the unknown function $f$ is supposed to be smooth;
in particular, it belongs to an isotropic Hölder ball of functions
$\mathbb{H}_d(\beta, L, M)$ (cf. Definition 1 below). Here $\beta > 0$ is
the smoothness of $f$, $L > 0$ is the Lipschitz constant and $M$ is an
upper bound of $f$ and its partial derivatives.



Motivation. In this paper, the considered problem is robust
nonparametric estimation, i.e. the estimation of the regression function
$f$ in the presence of a heavy-tailed noise (cf. Rousseeuw and Leroy
[1987] and Huber and Ronchetti [2009]). Well-known examples are when the
noise distribution is for instance Laplace (no finite exponential
moment) or Cauchy (no finite moment of any order). Moreover, we assume
that the noise densities $(g_i)_{i=1,\dots,n}$ are unknown to the
statistician. This problem has popular applications, for example in
relative GPS positioning (cf. Chang and Guo [2005]) or in robust image
denoising (cf. Astola, Egiazarian, Foi, and Katkovnik [2010]).

In the parametric case, we consider $f$ as a constant parameter
$\theta \in \mathbb{R}$. The use of empirical criteria is very popular,
i.e. the minimization of a criterion built from a contrast function
$\rho$:

$$\hat\theta = \arg\min_{t \in \mathbb{R}} \sum_{i=1}^{n} \rho(Y_i - t).$$

The most famous contrast functions are the square function
$\rho(z) = z^2$ ($\hat\theta$ becomes the empirical mean), the absolute
value function $\rho(z) = |z|$ ($\hat\theta$ becomes the empirical
median) and the Huber function, as defined in (5.3), for which
$\hat\theta$ has no explicit expression (cf. Huber [1964]). It is well
known that the square function leads to the empirical mean, which is not
suited to a heavy-tailed noise. Thus the square function is not suitable
in the model (1.1).
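
The three contrast functions above can be compared on a small sample. The following is a minimal sketch, not the procedure of this paper: the sample, the threshold `gamma = 1.0` and the bisection solver are illustrative assumptions. It solves the first order condition $\sum_i \rho'(Y_i - t) = 0$ for the Huber contrast, whose derivative is the residual clipped to $[-\gamma, \gamma]$.

```python
import statistics


def huber_rho_prime(z, gamma):
    """Derivative of the Huber contrast: z clipped to [-gamma, gamma]."""
    return max(-gamma, min(gamma, z))


def huber_location(ys, gamma, tol=1e-10, max_iter=200):
    """Solve sum_i rho'(y_i - t) = 0 by bisection.

    The score t -> sum_i rho'(y_i - t) is non-increasing, non-negative at
    min(ys) and non-positive at max(ys), so a root lies in between.
    """
    lo, hi = min(ys), max(ys)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        score = sum(huber_rho_prime(y - mid, gamma) for y in ys)
        if score > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical sample: five values near 0 and one gross outlier.
sample = [0.1, -0.3, 0.2, 0.05, -0.15, 50.0]
mean_est = statistics.fmean(sample)            # square contrast
median_est = statistics.median(sample)         # absolute value contrast
huber_est = huber_location(sample, gamma=1.0)  # Huber contrast
print(mean_est, median_est, huber_est)
```

On this sample the empirical mean is dragged above 8 by the single outlier, while the median and the Huber estimate stay close to the bulk of the data.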

In nonparametric estimation, we propose a local parametric approach
(LPA) to estimate the regression function at a given point
$x_0 \in [0,1]^d$ in the model (1.1). We suppose that $f$ is locally
almost polynomial (with degree $b \in \mathbb{N}$) and we use the
parametric estimator on a neighborhood denoted $V_{x_0}(h)$. The
parameter is reconstructed from the following criterion, for any
$x_0 \in [0,1]^d$ and $h \in [0,1]$:

$$\hat\theta = \arg\min_{t \in \Theta(M) \subset \mathbb{R}^{N_b}} \sum_{i=1}^{n} \rho\big(Y_i - f_t(X_i)\big)\, K\!\left(\frac{X_i - x_0}{h}\right), \tag{1.2}$$

where $f_t(\cdot)$ is a polynomial of degree $b$ with coefficients $t$,
$K(\cdot)$ is a kernel function, $\Theta(M)$ is the set defined in (2.3)
and $N_b$ is the number of partial derivatives of $f$ of order smaller
than $b$. We refer to $\hat f^h(x_0) = f_{\hat\theta}(x_0)$ as the
ρ-LPA estimator. It belongs to the family of M-estimators and it relies
on a local scale parameter $h$, called the bandwidth. A crucial issue is
the optimal choice of the parameter $h$. To address it we use quite
standard arguments based on the bias/variance trade-off (cf. (1.5)
below) in the minimax case and Lepski's rule for the data-driven
selection in adaptation. First, since $f$ is smooth
($f \in \mathbb{H}_d(\beta, L, M)$, cf. Definition 1 below) we notice
that

$$\exists\, \theta = \theta(f, x_0, h) \in \Theta(M) : \quad b_h := \sup_{x \in V_{x_0}(h)} |f(x) - f_\theta(x)| \le L d h^{\beta}. \tag{1.3}$$

We can choose $\theta$ as the coefficients of the Taylor polynomial as
defined in (2.6). Thus, if $h$ is chosen sufficiently small, our
original model (1.1) is well approximated inside $V_{x_0}(h)$ by the
"parametric" model

$$\mathcal{Y}_i = f_\theta(X_i) + \xi_i, \quad \forall i : X_i \in V_{x_0}(h). \tag{1.4}$$

With this model, the ρ-LPA estimator $\hat\theta$ achieves the usual
parametric rate of convergence $1/\sqrt{nh^d}$, where $nh^d$ is the
number of observations in the neighborhood $V_{x_0}(h)$ (see Theorem 1,
Section 3).

This approach was introduced by Katkovnik [1985] and used for the first
time in robust nonparametric estimation by Tsybakov [1986], Härdle and
Tsybakov [1988] and Hall and Jones [1990] to obtain asymptotic normality
and minimax results. We also notice that Tsybakov [1982a, 1982b, 1983]
obtained similar results for the estimation of locally almost constant
functions.

Minimax Estimation. To guarantee good performance of the ρ-LPA
estimator in the minimax sense, we assume that $\rho'$ is bounded and
Lipschitz. In particular, the Huber function satisfies these
assumptions, making it suitable for our problem. Moreover, it is
commonly used in practice (see for instance Petrus [1999] and Chang and
Guo [2005]). As for linear estimators (kernel estimators, least squares
estimators, etc.), a good choice of the bandwidth $h = h_n(\beta, L)$
provides an optimal ρ-LPA estimator over the Hölder space
$\mathbb{H}_d(\beta, L, M)$. Finally,
$h_n(\beta, L) = (L^2 n)^{-\frac{1}{2\beta + d}}$ is chosen as the
solution of the following bias/variance trade-off

$$(nh^d)^{-1/2} + L h^{\beta} \to \min_h. \tag{1.5}$$

In the model (1.1), we show that the corresponding estimator
$\hat f^{h_n(\beta, L)}(x_0)$ achieves the rate of convergence
$n^{-\frac{\beta}{2\beta + d}}$ (cf. Definition 2) for $f(x_0)$ on
$\mathbb{H}_d(\beta, L, M)$ (see Theorem 1). We should point out that
the statistician needs to know both $\beta$ and $L$ in order to build
the optimal bandwidth $h_n(\beta, L)$.
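
The form of this bandwidth can be recovered by balancing the stochastic term $(nh^d)^{-1/2}$ against the bias bound $L h^{\beta}$. The derivation below is only a sketch that ignores multiplicative constants: the balance point coincides with the exact minimizer of the sum up to a factor depending on $\beta$ and $d$.

```latex
\begin{align*}
(n h^d)^{-1/2} = L h^{\beta}
  &\iff h^{\beta + d/2} = L^{-1} n^{-1/2}
   \iff h^{2\beta + d} = (L^2 n)^{-1} \\
  &\iff h = (L^2 n)^{-\frac{1}{2\beta + d}}, \\
L h^{\beta} \Big|_{h = (L^2 n)^{-1/(2\beta + d)}}
  &= L^{1 - \frac{2\beta}{2\beta + d}} \, n^{-\frac{\beta}{2\beta + d}}
   = L^{\frac{d}{2\beta + d}} \, n^{-\frac{\beta}{2\beta + d}} .
\end{align*}
```

Both terms are then of order $n^{-\beta/(2\beta + d)}$, which is the rate of convergence stated above.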

Adaptive Estimation. In nonparametric statistics, an important problem
is the adaptation with respect to the smoothness parameters $\beta$ and
$L$, which are unknown in practice. This requires developing a
data-driven (adaptive) selection of the bandwidth. The interesting
feature is then the selection of estimators from a given family (cf.
Barron, Birgé, and Massart [1999], Lepski, Mammen, and Spokoiny [1997],
Goldenshluger and Lepski [2008]). In this context, several approaches to
the selection from the family of linear estimators were recently
proposed, see for instance Goldenshluger and Lepski [2008],
Goldenshluger and Lepski [2009], Juditsky, Lepski, and Tsybakov [2009]
and the references therein. However, those methods strongly rely on the
linearity property. Robust estimators are generally non-linear, so
standard arguments (like the bias/variance trade-off) cannot be applied
straightforwardly. For instance, Brown, Cai, and Zhou [2008] use the
asymptotic normality of the median to approximate the model (1.1) by
wavelet sequence data, and they use BlockJS wavelet thresholding for
adaptation over Besov spaces with the integrated risk. Recently, Reiss,
Rozenholc, and Cuenod [2011] have considered pointwise estimation for
locally almost constant functions in homoscedastic regression with a
heavy-tailed noise. That corresponds to $\beta \le 1$ for the Hölder
functions in the model (1.1) (cf. also Definition 1). They have
considered a symmetric and continuous density with $g(0) > 0$.

In the context of adaptation, other new points are developed in this
paper:

- adaptive pointwise estimation for any regularity $\beta$ of isotropic
functions,

- random design and heteroscedastic model,

- unknown and heavy-tailed noise.

To this end, we construct an adaptive estimator (cf. Definition 3) using
the general adaptation scheme due to Lepski [1990] (Lepski's method).
This method is applied to choose the bandwidth of the ρ-LPA estimator in
the model (1.1).

We recall that $M$, the upper bound of $f$ and its partial derivatives,
is involved in the construction of the ρ-LPA estimator (1.2). We
therefore assume that the parameter $M$ is known and we do not study the
adaptation with respect to it. Contrary to the constants $\beta$ and
$L$, one could estimate $M$ and "inject" it in the procedure without
loss of generality in the performance of our estimator (cf. Härdle and
Tsybakov [1988]).

Exponential Inequality. Lepski's procedure requires, in particular,
establishing an exponential inequality for the deviations of the ρ-LPA
estimator. As far as we know, these results are new.

Denote by $P_f$ the probability law of the observations $Z^{(n)}$
satisfying (1.1). As we mentioned above, we need to establish the
following inequality, for any $\varepsilon > 0$ and
$h \in (n^{-1/d}, 1)$:

$$P_f\Big(\sqrt{nh^d}\,\big|\hat f^h(x_0) - f(x_0)\big| \ge \varepsilon\Big) \le C \exp\left\{-\frac{\varepsilon^2}{A + B\varepsilon}\right\}, \tag{1.6}$$

where $C$, $A$, $B$ are positive constants and $A$, $B$ must be "known".
Details are given in Proposition 1. All results of this paper are based
on (1.6).

The main difficulty in establishing (1.6) is that an explicit expression
of the ρ-LPA estimator is typically not available. Let us briefly
discuss the main ingredients of M-estimation which allow us to prove
(1.6). If the derivative $\rho'$ of the contrast function is continuous,
then solving the minimization problem (1.2) can be viewed as solving the
following system of equations in $t$ (first order condition):

$$\forall p, \quad \tilde D_h^p(t) := \sum_{i=1}^{n} \frac{\partial}{\partial t_p}\, \rho\big(Y_i - f_t(X_i)\big)\, K\!\left(\frac{X_i - x_0}{h}\right) = 0, \tag{1.7}$$

where $t_p$ is the $p$ component of the vector $t$. Since $\rho'$ is
bounded, the partial derivatives $\tilde D_h^p(\cdot)$ can be viewed as
an empirical process (i.e. a sum of independent and bounded random
variables).

Denote by $\tilde D_h(\cdot)$ the vector of partial derivatives and
$D_h(\cdot) = \mathbb{E}_{f_\theta} \tilde D_h(\cdot)$, where
$\mathbb{E}_{f_\theta} = \mathbb{E}^n_{f_\theta}$ is the mathematical
expectation with respect to the probability law $P_{f_\theta}$ of the
"parametric" observations $(X_i, \mathcal{Y}_i)_{i=1,\dots,n}$.



Figure 1: Illustration of the deviations' control, relating the solution
$\hat\theta$ of $\tilde D_h = 0$ to the solution $\theta$ of $D_h = 0$.
The convergence shown is convergence in probability.

Properties of the function $D_h(\cdot)$ allow us to prove that $\theta$
is the unique solution of $D_h(\cdot) = 0$. We also notice that
$|\hat f^h(x_0) - f(x_0)| \le \|\hat\theta - \theta\|_1$, since
$\hat f^h(x_0)$ and $f(x_0)$ are the $(0,\dots,0)$ components of
$\hat\theta$ and $\theta$ respectively. The idea (presented in Figure 1)
is to deduce the exponential inequality for $\|\hat\theta - \theta\|_1$
from the exponential inequality for
$\sup_t |\tilde D_h(t) - D_h(t)|$. As we mentioned above,
$\sup_t |\tilde D_h(t) - D_h(t)|$ can be viewed as the supremum of an
empirical process.

Now, classical probabilistic tools can be used. To control
$\sup_t |\tilde D_h(t) - D_h(t)|$, we could use standard tools developed
by Talagrand [1996a, 1996b], Massart [2000] or Bousquet [2002]. But the
resulting exponential inequalities (like (1.6)) contain unknown
constants or require the knowledge of a bound on the expectation of
$\sup_t |\tilde D_h(t) - D_h(t)|$. To obtain this bound, we can use the
maximal inequalities developed by Van der Vaart and Wellner [1996]
(Chapter 2, Section 2.2) for sub-Gaussian processes. But here again,
there are universal (and unknown) constants in the bound of the
expectation. Massart [2007] (Chapter 6) gives exponential inequalities
for $\sup_t |\tilde D_h(t) - D_h(t)|$ without the expectation, but some
constants are very large in our case. In this paper, we choose to apply
a standard chaining argument and Bernstein's inequality (cf. (7.8))
directly to $\sup_t |\tilde D_h(t) - D_h(t)|$ (cf. Proof of Lemma 3).
That allows us to obtain constants smaller than the ones in the papers
cited above.

Perspectives. 

- We think that the conditions on the noise densities could be relaxed.
We could consider densities which are not necessarily monotone on
$\mathbb{R}_+$; only the symmetry assumption seems necessary.

- A possible perspective of this work is the study of the estimation of
anisotropic functions. Indeed, the methods developed by Kerkyacharian,
Lepski, and Picard [2001], Klutchnikoff [2005] and Goldenshluger and
Lepski [2008, 2009] are based on linearity properties, and the machinery
considered in those works cannot be adapted straightforwardly to
nonlinear estimators.

- Another perspective is to prove an oracle inequality for the family of
ρ-LPA estimators indexed by the bandwidth with the integrated risk. It
could be interesting to introduce some criterion for choosing the
optimal contrast function.

- Finally, we should also study the heteroscedastic model (1.1) with a
degenerate design, when the design density is vanishing or exploding.

This paper is organized as follows. We present exponential inequalities
in Section 2, in order to control the deviations of the ρ-LPA estimator.
In Section 3, we present the results concerning minimax estimation and
Section 4 is devoted to the adaptive estimation. An application of the
ρ-LPA estimator with the Huber function is proposed in Section 5. The
proofs of the main results (exponential inequalities and upper bounds)
are given in Section 6; technical lemmas are postponed to the appendix.

2 Exponential inequality for ρ-LPA estimator

Construction. To construct our estimator, we use the so-called local
polynomial approach (LPA), which consists in the following. Let

$$V_{x_0}(h) = \bigotimes_{j=1}^{d} \big[x_{0,j} - h/2,\ x_{0,j} + h/2\big] \cap [0,1]^d$$

be a neighborhood around $x_0$ of width $h \in (0,1)$, where $x_{0,j}$
denotes the $j$-th component of $x_0$. Fix $b > 0$ (without loss of
generality we will assume that $b$ is an integer), let

$$S_b = \big\{p = (p_1, \dots, p_d) \in \mathbb{N}^d : 0 \le |p| \le b,\ |p| = p_1 + \dots + p_d\big\},$$

and denote by $N_b$ the cardinality of $S_b$. Let $U(z)$,
$z \in \mathbb{R}^d$, be the $N_b$-dimensional vector of monomials of the
following type (the sign $\top$ below denotes the transposition):

$$U^{\top}(z) = \big(z^p,\ p \in S_b\big). \tag{2.1}$$

For any $t^{\top} = (t_{p_1,\dots,p_d} \in \mathbb{R} : p \in S_b) \in \mathbb{R}^{N_b}$,
we define the local polynomial in a neighborhood of $x_0$ as, for any
$x \in [0,1]^d$,

$$f_t(x) = t^{\top}\, U\!\left(\frac{x - x_0}{h}\right) \mathbb{I}_{V_{x_0}(h)}(x), \tag{2.2}$$

where $z^p = z_1^{p_1} \cdots z_d^{p_d}$ for $z = (z_1, \dots, z_d)$ and
$\mathbb{I}$ denotes the indicator function. For any $M > 0$, introduce
the following subset of $\mathbb{R}^{N_b}$:

$$\Theta(M) = \big\{t \in \mathbb{R}^{N_b} : \|t\|_1 \le M\big\}, \tag{2.3}$$

where $\|\cdot\|_1$ is the $\ell_1$-norm on $\mathbb{R}^{N_b}$. We notice
that for any $t \in \Theta(M)$, $\|f_t\|_\infty \le M$, where
$\|\cdot\|_\infty$ is the sup-norm on $[0,1]^d$.
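
The objects of this construction are easy to enumerate numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the local polynomial is $t^{\top} U((x - x_0)/h)$ restricted to the window $V_{x_0}(h)$ by the indicator function. It lists the multi-indices of $S_b$, checks that $N_b$ equals the binomial coefficient $\binom{b+d}{d}$, and evaluates $f_t$.

```python
from itertools import product
from math import comb, prod


def multi_indices(d, b):
    """All p in N^d with p_1 + ... + p_d <= b, i.e. the set S_b."""
    return [p for p in product(range(b + 1), repeat=d) if sum(p) <= b]


def monomials(z, indices):
    """Vector U(z) = (z^p, p in S_b) with z^p = z_1^{p_1} ... z_d^{p_d}."""
    return [prod(zj ** pj for zj, pj in zip(z, p)) for p in indices]


def local_polynomial(t, x, x0, h, indices):
    """Assumed form: f_t(x) = t^T U((x - x0)/h) on V_{x0}(h), 0 outside."""
    inside = all(
        abs(xj - x0j) <= h / 2 and 0.0 <= xj <= 1.0
        for xj, x0j in zip(x, x0)
    )
    if not inside:
        return 0.0
    z = [(xj - x0j) / h for xj, x0j in zip(x, x0)]
    return sum(tp * up for tp, up in zip(t, monomials(z, indices)))


S_b = multi_indices(d=2, b=2)
N_b = len(S_b)
print(S_b, N_b, comb(2 + 2, 2))
```

With this ordering the first multi-index is $(0, \dots, 0)$, so the first coefficient of $t$ is the value of $f_t$ at $x_0$.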

The function $\rho$ is called a contrast function if it has the
following properties.

Assumption 1.

1. $\rho : \mathbb{R} \to \mathbb{R}_+$ is symmetric, convex and
$\rho(0) = 0$,

2. the derivative $\rho'$ is 1-Lipschitz on $\mathbb{R}$ and bounded:
$\rho_\infty = \|\rho'\|_\infty < \infty$,

3. the second derivative $\rho''$ is defined almost everywhere and there
exist $L_\rho > 0$ and $\alpha > 0$ such that

$$\sup_{i = \overline{1,n}} \int_{\mathbb{R}} \big|\rho''(z + u) - \rho''(z + v)\big|\, g_i(z)\, dz \le L_\rho |u - v|^{\alpha}, \quad \forall u, v \in \mathbb{R},$$

where $\overline{1,n} = 1, \dots, n$.

A well-known example of a contrast function $\rho$ satisfying
Assumption 1 above is the Huber function (cf. Huber [1964]) presented in
Section 5.
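
The first two items of Assumption 1 can be checked numerically for the Huber function. The following sketch uses the standard Huber contrast (quadratic on $[-\gamma, \gamma]$, linear outside); the value `gamma = 1.5` and the grid are arbitrary illustrative choices, and a check on a finite grid is of course not a proof.

```python
def huber_rho(z, gamma):
    """Huber contrast: quadratic on [-gamma, gamma], linear outside."""
    a = abs(z)
    return 0.5 * z * z if a <= gamma else gamma * (a - 0.5 * gamma)


def huber_rho_prime(z, gamma):
    """Derivative of the Huber contrast: z clipped to [-gamma, gamma]."""
    return max(-gamma, min(gamma, z))


gamma = 1.5
grid = [k / 100.0 for k in range(-500, 501)]

# Item 1: symmetry and midpoint convexity on a coarse sub-grid.
symmetric = all(
    abs(huber_rho(z, gamma) - huber_rho(-z, gamma)) < 1e-12 for z in grid
)
coarse = grid[::25]
midpoint_convex = all(
    huber_rho(0.5 * (u + v), gamma)
    <= 0.5 * (huber_rho(u, gamma) + huber_rho(v, gamma)) + 1e-12
    for u in coarse
    for v in coarse
)

# Item 2: sup-norm of rho' and largest slope of rho' between grid points.
bounded = max(abs(huber_rho_prime(z, gamma)) for z in grid)
lipschitz = max(
    abs(huber_rho_prime(v, gamma) - huber_rho_prime(u, gamma)) / (v - u)
    for u, v in zip(grid, grid[1:])
)
print(symmetric, midpoint_convex, bounded, lipschitz)
```

The computed sup-norm of $\rho'$ is $\gamma$ and the largest slope is 1 up to rounding, in line with item 2.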

Let $K$ be a kernel function, i.e. a positive function with a compact
support included in $[-1/2, 1/2]^d$ such that
$K_\infty := \|K\|_\infty < \infty$ and $\int K(z)\,dz = 1$. We will
construct the ρ-LPA estimator for $f(x_0)$ using the local ρ-criterion,
which is defined as follows:

$$\pi_h(t) = \pi_h\big(t, Z^{(n)}\big) = \frac{1}{nh^d} \sum_{i=1}^{n} \rho\big(Y_i - f_t(X_i)\big)\, K\!\left(\frac{X_i - x_0}{h}\right). \tag{2.4}$$

Let $\hat\theta(h)$ be the solution of the following minimization
problem:

$$\hat\theta(h) = \arg\min_{t \in \Theta(M)} \pi_h(t). \tag{2.5}$$

The ρ-LPA estimator $\hat f^h(x_0)$ of $f(x_0)$ is defined as
$\hat f^h(x_0) = \hat\theta_{0,\dots,0}(h)$. We notice that this local
approach can also be used to estimate the successive derivatives of the
function $f$. However, in the present paper, we focus on the estimation
of $f(x_0)$.
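
As an illustration of this definition, here is a minimal sketch of a Huber local linear fit in dimension $d = 1$ (degree $b = 1$). It is not the implementation used in this paper, and several choices are assumptions made for the example: the uniform kernel on $[-1/2, 1/2]$, the iteratively reweighted least squares solver, the fixed number of iterations, and the fact that the constraint $t \in \Theta(M)$ is ignored.

```python
def huber_weight(r, gamma):
    """IRLS weight rho'(r)/r for the Huber contrast (1 when |r| <= gamma)."""
    return 1.0 if abs(r) <= gamma else gamma / abs(r)


def huber_local_linear(xs, ys, x0, h, gamma, n_iter=100):
    """Local linear Huber fit at x0 with the uniform kernel on [-1/2, 1/2].

    Returns (t0, t1) for the local model t0 + t1 * (x - x0) / h, so that t0
    estimates f(x0). The constraint t in Theta(M) is not enforced here.
    """
    pts = [((x - x0) / h, y) for x, y in zip(xs, ys) if abs(x - x0) <= h / 2]
    t0, t1 = 0.0, 0.0
    for _ in range(n_iter):
        # Weighted least-squares step, weights from the current residuals.
        s0 = s1 = s2 = r0 = r1 = 0.0
        for z, y in pts:
            w = huber_weight(y - t0 - t1 * z, gamma)
            s0 += w
            s1 += w * z
            s2 += w * z * z
            r0 += w * y
            r1 += w * z * y
        det = s0 * s2 - s1 * s1
        if det <= 1e-12:
            raise ValueError("too few distinct design points in the window")
        # Solve the 2x2 normal equations by Cramer's rule.
        t0, t1 = (s2 * r0 - s1 * r1) / det, (s0 * r1 - s1 * r0) / det
    return t0, t1


# Hypothetical data: a straight line with one gross outlier at x = 0.5.
xs = [i / 100.0 for i in range(101)]
ys = [2.0 + 3.0 * x for x in xs]
ys[50] += 100.0
fit_at_half, slope = huber_local_linear(xs, ys, x0=0.5, h=0.2, gamma=1.0)
print(fit_at_half, slope)
```

In this example the true value is $f(0.5) = 3.5$; the outlier contributes at most $\gamma$ to the first order condition, so the fit moves only slightly away from 3.5.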



Exponential inequality. Later on, we will only consider values of
$h \in (n^{-1/d}, 1)$. Put
$\theta = \theta(f, x_0, h) = \{\theta_p,\ p \in S_b\}$, where
$\theta_0 = \theta_{0,\dots,0} = f(x_0)$

and

$$\theta_p = \frac{h^{|p|}}{p_1! \cdots p_d!}\, \frac{\partial^{|p|} f(x_0)}{\partial x_1^{p_1} \cdots \partial x_d^{p_d}}, \quad |p| \ge 1. \tag{2.6}$$

Here, we do not assume the existence of the partial derivatives of $f$.
To define $\theta$ properly, the following convention will be used in
the sequel: if the function $f$ and the vector $p$ are such that
$\partial^{|p|} f$ does not exist, we put $\theta_p = 0$.

Set $B(\theta, z) = \{t \in \Theta(M) : \|t - \theta\|_2 \le z\}$, the
Euclidean ball with radius $z$ and center $\theta$, and define the
event, for any $h, z > 0$,

$$G_z^h = \big\{\hat\theta(h) \in B(\theta, z)\big\}, \tag{2.7}$$

where $\hat\theta(h)$ is given by (2.5). Let

be some finite constants and let the constant $\lambda$ be the smallest
eigenvalue of the matrix

$$\int_{[-1/2,\,1/2]^d} U(x)\, U^{\top}(x)\, K(x)\, dx.$$

Tsybakov [2008] (Lemma 1.4) showed that $\lambda$ is positive; in other
words, the last matrix is strictly positive definite. Finally, put

$$c\big(\rho, (g_i)_i\big) = \inf_{i} \int_{\mathbb{R}} \rho''(z)\, g_i(z)\, dz, \tag{2.9}$$

and define the set of sequences of symmetric densities which are
monotone on $\mathbb{R}_+$

$$\mathcal{G}_c^{\rho} = \big\{(g_i)_i : c\big(\rho, (g_i)_i\big) \ge c\big\}, \quad c > 0. \tag{2.10}$$

Denote, for all $a, b \in \mathbb{R}$, $a \vee b = \max(a, b)$. The next
proposition is the milestone for all results proved in the paper.



Proposition 1. Let $\rho$ be a contrast function and let $c > 0$. Then,
for any $n \in \mathbb{N}^*$, $(g_i)_i \in \mathcal{G}_c^{\rho}$,
$h > n^{-1/d}$, $M > 0$, $x_0 \in [0,1]^d$ and any $f$ such that
\\9{f, xo, h)\\i< M, we have for any e>^{iy hh v^) 

P/ (^'\f\x,)-f{x,)\>e, Cf 



< iVftrexp < -rj. ^ — ^ > . (2.11) 

8i^^(lVp^) + ^(lVpo.)^ ^ ^ ^ 

The proof of this proposition is given in Section 6. 

Remark 1. The control of the deviations of $\hat f^h$ is carried out on
the event $G_\delta^h$ that the estimator $\hat\theta(h)$ is contained
in a ball centered at $\theta$ whose radius does not depend on $n$;
otherwise it could change the rate of convergence. In Section 6 we give
an exponential inequality to control the probability of the complement
of $G_\delta^h$ (cf. Lemma 4).

Remark 2. In the minimax case, the knowledge of the constants in (2.11)
is not required. However, for adaptation, the constant $c$ is involved
in the construction of the adaptive estimator. This restricts our
consideration to the noise densities which satisfy (2.10). We notice
that this problem reduces to the calibration of a single constant with a
dataset.



3 Minimax Results on $\mathbb{H}_d(\beta, L, M)$

In this section, we present several results concerning maximal and
minimax risks on $\mathbb{H}_d(\beta, L, M)$. We propose an estimator
which bounds the maximal risk on this class of functions without any
restriction imposed on these parameters.



Preliminaries. 

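Definition 1. Fix $\beta > 0$, $L > 0$, $M > 0$ and let
$\lfloor\beta\rfloor$ be the largest integer strictly smaller than
$\beta$. The isotropic Hölder class $\mathbb{H}_d(\beta, L, M)$ is the
set of functions $f : [0,1]^d \to \mathbb{R}$ admitting on $[0,1]^d$ all
partial derivatives of order $\lfloor\beta\rfloor$ and such that for any
$x, y \in [0,1]^d$

$$\left|\frac{\partial^{|p|} f(x)}{\partial x_1^{p_1} \cdots \partial x_d^{p_d}} - \frac{\partial^{|p|} f(y)}{\partial y_1^{p_1} \cdots \partial y_d^{p_d}}\right| \le L\, \|x - y\|_1^{\beta - \lfloor\beta\rfloor}, \quad \forall\, |p| = \lfloor\beta\rfloor,$$

$$\sum_{m=0}^{\lfloor\beta\rfloor} \sum_{|p| = m} \sup_{x \in [0,1]^d} \left|\frac{\partial^{|p|} f(x)}{\partial x_1^{p_1} \cdots \partial x_d^{p_d}}\right| \le M,$$

where $x_j$ and $y_j$ are the $j$-th components of $x$ and $y$.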

Let $\mathbb{E}_f = \mathbb{E}_f^n$ be the mathematical expectation with
respect to the probability law $P_f$ of the observation $Z^{(n)}$
satisfying (1.1). Firstly, we define the maximal risk on
$\mathbb{H}_d(\beta, L, M)$ corresponding to the estimation of the
function $f$ at a given point $x_0 \in [0,1]^d$.

Let $\tilde f_n$ be an arbitrary estimator built from the observation
$Z^{(n)}$. For any $r$, let

$$R_{n,r}\big[\tilde f_n, \mathbb{H}_d(\beta, L, M)\big] = \sup_{f \in \mathbb{H}_d(\beta, L, M)} \mathbb{E}_f \big|\tilde f_n(x_0) - f(x_0)\big|^r. \tag{3.1}$$

This quantity is called the maximal risk of the estimator $\tilde f_n$
on $\mathbb{H}_d(\beta, L, M)$ and the minimax risk on
$\mathbb{H}_d(\beta, L, M)$ is defined as

$$R_{n,r}\big[\mathbb{H}_d(\beta, L, M)\big] = \inf_{\tilde f} R_{n,r}\big[\tilde f, \mathbb{H}_d(\beta, L, M)\big], \tag{3.2}$$

where the infimum is taken over the set of all estimators.

Definition 2. The normalizing sequence $\psi_n$ is called the minimax
rate of convergence and the estimator $\hat f$ is called minimax
(asymptotically minimax) if

$$\liminf_{n \to \infty} \psi_n^{-r}\, R_{n,r}\big[\mathbb{H}_d(\beta, L, M)\big] > 0, \tag{3.3}$$

$$\limsup_{n \to \infty} \psi_n^{-r}\, R_{n,r}\big[\hat f, \mathbb{H}_d(\beta, L, M)\big] < \infty. \tag{3.4}$$

Upper bound for maximal risk. Let the minimizer of the bias/variance
trade-off (1.5) be given by

$$\bar h = (L^2 n)^{-\frac{1}{2\beta + d}}. \tag{3.5}$$

The next theorem shows how to construct the estimator based on the
locally parametric approach which achieves the following rate of
convergence in the model (1.1)

$$\varphi_n(\beta) = n^{-\frac{\beta}{2\beta + d}}. \tag{3.6}$$

Let $\hat f^{\bar h}(x_0) = \hat\theta_{0,\dots,0}(\bar h)$ be given by
(2.3), (2.4) and (2.5) with $h = \bar h$ and

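Theorem 1. Let $\beta > 0$, $L > 0$, $M > 0$, $x_0 \in [0,1]^d$,
$c > 0$ and let $\rho$ be a fixed contrast function. Then for any
$(g_i)_i \in \mathcal{G}_c^{\rho}$

$$\limsup_{n \to \infty} \varphi_n^{-r}(\beta)\, R_{n,r}\big[\hat f^{\bar h}(x_0), \mathbb{H}_d(\beta, L, M)\big] < \infty, \quad \forall r \ge 1.$$

This theorem will be deduced from Proposition 1 and the proof is given
in Section 6.2.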

Remark 3. Tsybakov [1982a] showed the lower bound (3.3) for the rate
$n^{-\frac{\beta}{2\beta + d}}$ under the following assumption on the
Kullback distance for the noise density $g$: there exists $v_0 > 0$ such
that

$$\int_{\mathbb{R}} g(u) \ln \frac{g(u)}{g(u + v)}\, du \le O(v^2), \quad \forall v : |v| \le v_0.$$

We notice that Gaussian and Cauchy densities verify this assumption (cf.
also Tsybakov [2008], Chapter 2). In this case, we conclude that
$\hat f^{\bar h}$ is minimax and $\varphi_n(\beta)$ is the minimax rate
on $\mathbb{H}_d(\beta, L, M)$.

4 Bandwidth Selection of ρ-LPA Estimator

This section is devoted to the adaptive estimation over the collection
of classes $\{\mathbb{H}_d(\beta, L, M)\}_{\beta, L}$. Here we suppose
$M$ known; as we mentioned in the introduction, the parameter $M$ could
be estimated and used with a "plug-in" method (cf. Härdle and Tsybakov
[1992]). We will not impose any restriction on the possible value of
$L$, but we will assume that $\beta \in (0, b]$, where $b$, as
previously, is an arbitrarily chosen integer.

We start by remarking that there is no optimally adaptive estimator. A
well-known disadvantage of the maximal approach is the dependence of the
estimator on the smoothness parameters describing the functional class
on which the maximal risk is determined (cf. (3.1)). In particular,
$h_n(\beta, L)$, optimally chosen in view of (1.5), depends explicitly
on $\beta$ and $L$. To overcome this drawback, a maximal adaptive
approach has been proposed by Lepski [1990] for pointwise estimation.
The first question arising in the adaptation (reduced to the problem at
hand) can be formulated as follows.

Does there exist an estimator which would be minimax on
$\mathbb{H}_d(\beta, L, M)$ simultaneously for all values of $\beta$ and
$L$ belonging to some given set
$\mathcal{B} \subset [\mathbb{R}_+ \setminus 0] \times [\mathbb{R}_+ \setminus 0]$?

For integrated risks, the answer is positive (cf. Lepski [1991], Donoho,
Johnstone, Kerkyacharian, and Picard [1995], Lepski and Spokoiny [1997],
Goldenshluger and Nemirovski [1997] and Juditsky [1997]). For the
estimation of the function at a given point, it is typical that the
price to pay is not null (cf. Lepski [1990], Brown and Low [1996],
Lepski and Spokoiny [1997], Tsybakov [1998], Klutchnikoff [2005], Reiss,
Rozenholc, and Cuenod [2011], Chichignoud [2011]). Mostly, the price to
pay is a power of $(b - \beta) \ln n$ for pointwise estimation.

Let $\Psi = \{\psi_n(\beta)\}_{\beta \in (0, b]}$ be a given family of
normalizations.

Definition 3. The family $\Psi$ is called admissible if there exists an
estimator $\tilde f_n$ such that for some $L, M > 0$

$$\limsup_{n \to \infty} \psi_n^{-r}(\beta)\, R_{n,r}\big(\tilde f_n, \mathbb{H}_d(\beta, L, M)\big) < \infty, \quad \forall \beta \in (0, b]. \tag{4.1}$$

The estimator $\tilde f_n$ satisfying (4.1) is called $\Psi$-attainable.
The estimator $\tilde f_n$ is called $\Psi$-adaptive if (4.1) holds for
any $L > 0$.



Lepski [1990] showed that the family of rates
$\{\varphi_n(\beta)\}_{\beta \in (0, b]}$, defined in (3.6), is not
admissible in the white noise model. With other tools, Brown and Low
[1996] extended this result to density estimation and nonparametric
Gaussian regression. It means that there is no estimator which would be
minimax simultaneously for several values of the parameter $\beta$, for
pointwise estimation, even if $L$ is supposed to be fixed. This result
does not require any restriction on $\beta$ either.

Now, we need to find another family of normalizations for the maximal
risk which would be attainable and, moreover, optimal in view of some
criterion of optimality. Let $\Phi$ be the following family of
normalizations, for any $\beta \in (0, b]$



We notice that $\phi_n(b) = \varphi_n(b)$ and, for $n$ large enough,
$\varrho_n(\beta) \sim (b - \beta) \ln n$ for any $\beta \neq b$. It is
possible to show that this family $\Phi$ is adaptive optimal using the
most recent criterion developed by Klutchnikoff [2005] for the white
noise model and used by Chichignoud [2011] for the multiplicative
uniform regression. In this sense, the so-called price to pay for
adaptation $\varrho_n(\beta)$ could be considered as optimal.

Construction of $\Phi$-adaptive estimator. The construction of our
estimation procedure is decomposed into several steps. First, we
determine the family of ρ-LPA estimators. Next, based on Lepski's
method, we propose a data-driven selection from this family.

Let $\rho$ be a fixed contrast function. In the model (1.1), we recall
that the sequence of densities $(g_i)_i$ is "unknown" to the
statistician. We take $\hat f^h$, the estimator given by (2.3), (2.4)
and (2.5), so the family of ρ-LPA estimators $\mathcal{F}$ is now
defined as follows. Put




Inn. 



(4.2) 



h- 



mm 



(lnn)2/'^n^i/^ h 



max 



= n 2''+'' , 



(4.3) 



and 



hk = 2 



A; = 0, k, 



-n 



0,...,k, 
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where k„ is the largest integer such that /ik„ > /imin- Set 



(4.4) 



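We put $f^*(x_0) = \hat f^{(\hat k)}(x_0)$, where
$\hat f^{(\hat k)}(x_0)$ is selected from $\mathcal{F}$ in accordance
with the rule:

$$\hat k = \inf\Big\{k = 0, \dots, k_n : \big|\hat f^{(k)}(x_0) - \hat f^{(l)}(x_0)\big| \le C\, S_n(l),\ l = k + 1, \dots, k_n\Big\}. \tag{4.5}$$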



Here we have used the following notations. Let c > be fixed and 



C = ^(l + 2iroo(lVpoo)v^) 
1 4 

Sn{l) = 



cX 
1 + / In 2 



(4.6) 



n 



1/2 



/ — 0, kn, 



where $r \ge 1$ is the power of the risk and $c$ is defined in (2.9),
$\rho_\infty$ and $K_\infty$ are respectively bounds of $\rho'(\cdot)$
and $K(\cdot)$, and the positive constant $\lambda$ is the smallest
eigenvalue of the matrix
$\int_{[-1/2,\,1/2]^d} U(x)\, U^{\top}(x)\, K(x)\, dx$. We will see that
this matrix is strictly positive definite (cf. Lemma 1).
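
The selection rule (4.5) can be sketched in a few lines. The code below is illustrative and is not the procedure of this paper as such: the function names are hypothetical, and the thresholds (the quantities $C\,S_n(l)$ in the rule) are supplied by the caller rather than computed from the constants of this section. Index 0 corresponds to the largest bandwidth.

```python
def bandwidth_grid(h_max, h_min):
    """Dyadic grid h_k = 2^{-k} h_max for k = 0, ..., k_n.

    k_n is the largest k such that h_k >= h_min.
    """
    grid, h = [], h_max
    while h >= h_min:
        grid.append(h)
        h /= 2.0
    return grid


def lepski_select(estimates, thresholds):
    """Smallest k with |est[k] - est[l]| <= thresholds[l] for all l > k.

    estimates[k] is the estimator computed with bandwidth h_k and
    thresholds[l] plays the role of C * S_n(l); both are caller-supplied.
    """
    k_n = len(estimates) - 1
    for k in range(k_n + 1):
        if all(
            abs(estimates[k] - estimates[l]) <= thresholds[l]
            for l in range(k + 1, k_n + 1)
        ):
            return k
    return k_n  # not reached: the condition is vacuous for k = k_n


# Hypothetical values: the two largest bandwidths are visibly biased.
estimates = [5.0, 2.0, 1.1, 1.0, 0.9]
thresholds = [0.1, 0.2, 0.3, 0.5, 0.8]
k_hat = lepski_select(estimates, thresholds)
grid = bandwidth_grid(h_max=1.0, h_min=0.1)
print(k_hat, grid)
```

The rule keeps the largest bandwidth whose estimate agrees, up to the thresholds, with all estimates built on smaller bandwidths; here it discards the first two candidates.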



Main Result. The next theorem is the main result of this paper. It
allows us to guarantee a good performance of our adaptive ρ-LPA
estimator $f^*$.

Theorem 2. Let $b > 0$, $M > 0$ and $\rho$ be a fixed contrast function.
Then, for any $(g_i)_i \in \mathcal{G}_c^{\rho}$, $\beta \in (0, b]$,
$L > 0$ and $r \ge 1$

$$\limsup_{n \to \infty} \phi_n^{-r}(\beta)\, R_{n,r}\big[f^*(x_0), \mathbb{H}_d(\beta, L, M)\big] < \infty.$$

The proof (given in Section 6.3) is based on the scheme due to Lepski,
Mammen, and Spokoiny [1997].

Remark 4. The assertion of the theorem means that the proposed estimator
$f^*(x_0)$ is $\Phi$-adaptive in the model (1.1) (cf. Definition 3). It
implies in particular that the family of normalizations $\Phi$ is
admissible.

Remark 5. In the present paper, we do not give the explicit expression
of the constant in the upper bound of the risk. But it is possible to
solve this problem. In the proof of Lepski's method, we notice that the
upper bound depends polynomially on the parameter $C$ and it is
important to minimize this constant. We see that this constant depends
on the contrast function $\rho$, and it is easy to see that minimizing
$C = C(\rho)$ can be viewed as minimizing the following Huber variance
(cf. Huber and Ronchetti [2009], page 74)

$$\frac{\int (\rho')^2\, dg}{\left(\int \rho''\, dg\right)^2},$$

where $g$ is the noise density in the homoscedastic model.

Remark 6. The limitation to isotropic classes of functions is due to the
use of Lepski's procedure. It seems that, to be able to treat the
adaptation over the scale of anisotropic classes (i.e. d-dimensional
functions with a different regularity $\beta$ for each variable),
another scheme should be applied, as in Lepski and Levit [1999],
Kerkyacharian, Lepski, and Picard [2001], Klutchnikoff [2005] and
Goldenshluger and Lepski [2008]. As we have mentioned above, these
latter procedures cannot be used with ρ-LPA estimators, and for the
model (1.1) this problem is still open.

5 Application: Huber function 

Consider the model (1.1), with the following additional assumptions.

$$g_i(\cdot) = g(\cdot/\sigma_i)/\sigma_i, \quad i = 1, \dots, n, \tag{5.1}$$

where the density $g$ is symmetric and monotone on $\mathbb{R}_+$, and
$(\sigma_i)_i$ is a sequence of real values such that for any $i$,
$0 < \sigma_{\min} \le \sigma_i < \infty$, where $\sigma_{\min}$ is
known. The model (1.1) with (5.1) can be written as

$$Y_i = f(X_i) + \sigma_i \varepsilon_i, \quad i = 1, \dots, n, \tag{5.2}$$

where $(\varepsilon_i)$ are i.i.d. with the density $g$.
Let

$$\rho_\gamma(z) = \gamma\big(|z| - 0.5\,\gamma\big)\, \mathbb{I}_{|z| > \gamma} + 0.5\, z^2\, \mathbb{I}_{|z| \le \gamma}, \quad z \in \mathbb{R},\ \gamma > 0, \tag{5.3}$$

be the Huber function (Huber [1964]). We construct the
$\rho_\gamma$-LPA estimator from (2.3), (2.4) and (2.5). The function
$\rho_\gamma$ is a contrast function verifying Assumption 1. Recall that
the constant $c = c(\rho_\gamma, (g_i)_i)$ defined in (2.9) must be
positive. We notice that the second derivative can be written as
$\rho_\gamma''(\cdot) = \mathbb{I}_{[-\gamma, \gamma]}(\cdot)$ and that



c(P7' (S'OO > := 2 / g{z)dz. 

Jo 

We formulate the following assertion: for any $\sigma_{\min} > 0$ and
any symmetric density $g$ monotone on $\mathbb{R}_+$, there exists a
constant $\gamma_0 > 0$ such that for any $\gamma \ge \gamma_0$,
$c_\gamma > 0$.

We propose the adaptive $\rho_\gamma$-LPA estimator $f_\gamma^*(x_0)$
selected with the data-driven selection proposed in Section 4 with the
constant

The next result is a direct consequence of Theorem 2. 

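Corollary 1. Let $b > 0$, $M > 0$ be some fixed constants and consider
the model (5.2). Then, for any
$(g_i)_i \in \mathcal{G}_{c_\gamma}^{\rho_\gamma}$, $\beta \in (0, b]$,
$L > 0$ and $r \ge 1$

$$\limsup_{n \to \infty} \phi_n^{-r}(\beta)\, R_{n,r}\big[f_\gamma^*(x_0), \mathbb{H}_d(\beta, L, M)\big] < \infty.$$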

Remark 7. We notice that the threshold parameter $C$ explicitly depends
on the lower bound $\sigma_{\min}$ of the noise variances
$(\sigma_i)_i$. Contrary to linear estimators (for which $C$ depends
polynomially on $(\sigma_i)_i$), we can see that the influence of
$(\sigma_i)_i$ is very limited for ρ-LPA estimators.

Remark 8. Corollary 1 only guarantees that, asymptotically, for any
$\gamma \ge \gamma_0$, $\rho_\gamma$-LPA estimators have the same
performance. In the future, an important question to address is: how can
one choose the parameter $\gamma$? In theory, there is as yet no
criterion for choosing an optimal $\gamma$, but we can make the
following remarks. If $\gamma = \infty$, then the $\rho_\infty$-LPA
estimator is the least squares estimator (sensitive to extreme values of
the noise), and if $\gamma = 0$ then the $\rho_0$-LPA estimator becomes
the median estimator (robust estimator). It is well known that the least
squares estimator and the median estimator respectively suffer from
undersmoothing and oversmoothing. This phenomenon is highlighted by
Reiss, Rozenholc, and Cuenod [2011]. We believe that a better choice of
the parameter $\gamma$ should give a "semi-robust" estimator. Locally
this could reduce the above mentioned issue. In practice, it will be
interesting to select the parameter $\gamma$ as a measurable function of
the observations which adapts to extreme values of the noise. This
problem is related to the estimation of the noise variance and to the
minimization of the Huber variance (cf. Huber and Ronchetti [2009], page
74).

6 Proofs of Main Results: Exponential Inequalities and Upper Bounds

6.1 Proof of Proposition 1 

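Notations. Recall that
$S_b = \{q \in \mathbb{N}^d : 0 \le |q| \le b,\ |q| = q_1 + \dots + q_d\}$
and $N_b$ is its cardinality. We consider the partial derivatives of the
local ρ-criterion

$$\tilde D_h(\cdot) = \left(\frac{\partial}{\partial t_p}\, \pi_h(\cdot)\right)_{p \in S_b}, \tag{6.1}$$

where $\pi_h(\cdot)$ is the local ρ-criterion defined in (2.4). Let also

$$\mathcal{E}_h(\cdot) = \mathbb{E}_f\big[\tilde D_h(\cdot)\big] \quad \text{and} \quad D_h(\cdot) = \mathbb{E}_{f_\theta}\big[\tilde D_h(\cdot)\big], \tag{6.2}$$

where $f_\theta$ is the Taylor polynomial defined in (2.6),
$\mathbb{E}_{f_\theta} = \mathbb{E}^n_{f_\theta}$ is the mathematical
expectation with respect to the probability law $P_{f_\theta}$ of the
"parametric" observations $(X_i, \mathcal{Y}_i)_{i=1,\dots,n}$ (cf.
(1.4)) and $\mathbb{E}_f = \mathbb{E}^n_f$ is the mathematical
expectation with respect to the probability law $P_f$ of the observation
$Z^{(n)}$. We denote by $J_D$ the Jacobian matrix of $D_h$, such that

$$\big(J_D(\cdot)\big)_{p,q \in S_b} = \left(\frac{\partial}{\partial t_q}\, D_h^p(\cdot)\right)_{p,q \in S_b} = \left(\mathbb{E}_{f_\theta}\, \frac{\partial}{\partial t_q}\, \tilde D_h^p(\cdot)\right)_{p,q \in S_b}, \tag{6.3}$$

where $D_h^p(\cdot)$ is the $p$-th component of $D_h(\cdot)$.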



Auxiliary lemmas. We give the following lemma concerning the deterministic
criterion $D_h$ defined in (6.2). Denote by $\|\cdot\|_2$ the $\ell_2$-norm
on $\mathbb{R}^{N_b}$.

Lemma 1. Let $\rho$ be a contrast function. For any
$(g_i)_i \in \mathcal{G}_\rho^{(n)}$, we have the following assertions:

1. the matrix $J_D(\theta)$ is strictly positive definite and $\theta$ is
the unique solution of the equation $D_h(\cdot) = (0, \dots, 0)$ on $\Theta(M)$,

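2. there exists $\delta > 0$, which only depends on the contrast function
$\rho$, such that for any $\tilde\theta \in B(\theta, \delta)$ we have

$$\|\tilde\theta - \theta\|_2 \le \frac{2}{c\lambda}\,\big\|D_h(\tilde\theta) - D_h(\theta)\big\|_2.$$

Recall that $b_h = \sup_{x \in V_{x_0}(h)} |f_\theta(x) - f(x)|$ corresponds
to the approximation error (bias), and denote by $\mathcal{E}_h^p(\cdot)$ the
$p$-th component of $\mathcal{E}_h(\cdot)$. Let us give a lemma which allows
us to control the bias term.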

Lemma 2. For any contrast function p, h > n~^/'^ and any f such that 
ll^lli ^ we have 

$$\max_{p \in S_b}\ \sup_{t \in \Theta(M)} \big|\mathcal{E}_h^p(t) - D_h^p(t)\big| \le b_h.$$

The next result allows us to control deviations of partial derivatives of 
p-criterion Dh defined in (6.1). 

Lemma 3. For any contrast function p, any f such that < M and any 
h > n^^^'^, we have for any z 



> z 



maxP/ f sup Dl{t) - £l{t) 

{(^z — bh \fnhP^ 
"8/e(iv/3y+^(i77::j^ 

As we mentioned above, the partial derivatives $(\tilde D_h^p(\cdot))_p$ can
be considered as empirical processes. Thus the proof (given in the Appendix)
is based on a chaining argument and Bernstein's inequality (cf. (7.8)). In
particular, it requires the derivative $\rho'$ of the contrast function to
be bounded and Lipschitz.

Denote by $G_\delta^c$ the complement of $G_\delta$ (defined in (2.7)),
where the radius $\delta$ is defined in Lemma 1, and let
$\varkappa_\delta = \inf_{t \in \Theta(M)\setminus B(\theta,\delta)} \|D_h(t)\|_2/2$
be a positive constant. The next lemma allows us to control the probability
of the event that "the $\rho$-LPA estimator does not belong to the ball
centered at $\theta$ with radius $\delta$".

Lemma 4. For any contrast function p, f E M.d{P, L, M), 5 > and n 
such that 

Xs 



>2 sup {iVbhVnJ?), 

/ie[/lmin,/lmax] 



we have 



P/ < N.S exp 



8ir^(lVpy + 4^iroo(lVpo 



Proofs of those lemmas are given in Appendix. 

Proof of Proposition 1. Definitions of 9{h) and 9 = 9{f,xo,h) imply 
that for any e > ^ (l V bhVnJ?) 

Ff (v^|/'^(xo) - /(xo)| > e, Gj) < P; (v^|^o,...,o(/^) - ^o,....o| > s, G^) 

< ¥f(^V^^/N'b\\9{h)-0\\^>e,G'^y 

where $\|\cdot\|_2$ is the $\ell_2$-norm on $\mathbb{R}^{N_b}$. Under the
event $G_\delta$ we have $\hat\theta(h) \in B(\theta, \delta)$ for the
specific choice of $\delta$ given in Lemma 1 and depending on the contrast
$\rho$. According to Lemma 1 (2) we obtain that



¥f [V^\f\xo)-f{xo)\>e, a: 



> e 



Using Lemma 1 (1) and the definition of $\hat\theta(h)$ in (2.5), recalling
that $D_h(\theta) = \tilde D_h(\hat\theta(h)) = 0$, and using the well-known
inequality $\|\cdot\|_2 \le \sqrt{N_b}\,\|\cdot\|_\infty$ (where
$\|\cdot\|_2$ and $\|\cdot\|_\infty$ are respectively the $\ell_2$-norm and
the $\ell_\infty$-norm on $\mathbb{R}^{N_b}$), we get with the last inequality:

Ff (v^|/"(a;o)-/(xo)| >£, G 



< Ff ( VnJ? 



UK 

cX 



< P/ ( VnJ? sup 
pes, \ *e0(Af) 



m) - Dm 



> e 



> 



2 

cA e 
2N. 



Applying Lemma 3 with z = |^ and the last inequality, finally we obtain 
the assertion of Proposition 1 

Ff (V^\f\xo)-fixo)\>e, G'^s 

(i^ - (1 V bhV^) 



< Nf,IJ exp 



6.2 Proof of Theorem 1 

Before proving the main results of this paper, let us state some
auxiliary results. The next proposition provides an upper bound for the
risk of a $\rho$-LPA estimator. Put



oo 



cA 

2_ cA 1 ' 

Nt 2 



X exp < — -^^ TT — } dz, r > 1. (6.4) 

Proposition 2. Let $\rho$ be a contrast function. Then, for any
$n \in \mathbb{N}^*$, $h > n^{-1/d}$, $x_0 \in [0,1]^d$ and any $f$ such
that $\|\theta\|_1 < \infty$, we have

$$E_f\,\big|\hat f^h(x_0) - f(x_0)\big|^r\,\mathbf{1}_{G_\delta} \le C_r\,\big(1 \vee b_h\sqrt{nh^d}\big)^r\,(nh^d)^{-r/2}, \qquad r > 1.$$

The proof of Proposition 2 is deduced from Proposition 1 by integration.
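The integration step can be illustrated numerically. The sketch below relies on the classical identity $E|Z|^r = \int_0^\infty r z^{r-1} P(|Z| > z)\,dz$, which turns a tail bound into a moment bound. The function name `moment_from_tail`, the truncation point and the exponential tail used as an example are assumptions made for illustration; they are not quantities from the paper.

```python
import math


def moment_from_tail(tail, r, upper=50.0, steps=100_000):
    """Approximate E|Z|^r = int_0^inf r z^(r-1) P(|Z| > z) dz with a midpoint rule.

    `tail(z)` must return P(|Z| > z) (or an upper bound on it, which then
    yields an upper bound on the moment). The integral is truncated at `upper`.
    """
    width = upper / steps
    total = 0.0
    for j in range(steps):
        z = (j + 0.5) * width
        total += r * z ** (r - 1) * tail(z)
    return total * width


if __name__ == "__main__":
    # Example: Z exponential with rate 1, so P(Z > z) = exp(-z) and E Z^r = Gamma(r + 1).
    for r in (2.0, 3.0):
        approx = moment_from_tail(lambda z: math.exp(-z), r)
        print(r, approx, math.gamma(r + 1.0))
```

Plugging an exponential inequality such as the one of Proposition 1 into `tail` gives, in the same way, a finite constant playing the role of $C_r$.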
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Proof of Theorem 1 By definition of M.a{P, L, M), the approximation 
error (bias) $b_h$ as defined in (1.3) satisfies $b_h \le L d h^\beta$ for any $h > 0$. Moreover,
by definition of $\bar h = (L^2 n)^{-1/(2\beta+d)}$ in (3.5), we have that $b_{\bar h}\sqrt{n\bar h^d} \le d$ and
(n¥y^^^ = L^(^„(/3). We get 

(xo)-/(xo)r = Ef\f\xo)-f{xo)\\-n+Ej\f\xo)-f{xo)\%n. (6.5) 



The right hand side is controlled by Lemma 4. Indeed, we can use the 
Cauchy-Schwarz inequality, 

(xo)-/(xo)|%. 

< (Ef\f\xo)-fixo)\''Ff[G^Y' 

< (2MYJnJj exp <^ ; " / ^ \ . 6.6 

1 16i^^(lVp^) + ^i^oo(lVpoo)J ^ ^ 

The last inequality is obtained because $M$ is an upper bound of $f$ and
$\hat f^h$ (cf. Definition 1 and (2.3)). Using Proposition 2, (6.5) and
(6.6), we obtain

E/ifM-/(xo)r' 

< Crd' L^<(/3) 



+ {2M)WNbE exp 



16ir^(lVp^) + ^iroo(lVpe 

When n tends towards +oo. Theorem 1 is proved. 



6.3 Proof of Theorem 2 



We start the proof by formulating some auxiliary results whose proofs
are given in the Appendix. Define



h* 



1 

2/3+ti 



L'^d'^ n 

where ^?^(/3) is defined in (4.2). Let k be an integer defined as follows: 

2''^h < h* < (6 

^ "-max ^ " ^ "-max- 

For any n large enough, we have /imin ^ h* < h^^.^. 



(6.7) 
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Lemma 5. For any f G IHI(i(/3, L, M), any n large enough and any k> n + 1 

¥f{k = k, G^*) < j2-2('=-iW, 
where J = NbS{l + (1 - 2-'^'''^)-^) . 

Proof of Theorem 2. This proof is based on the scheme due to Lepski,
Mammen, and Spokoiny [1997]. The definitions of $h^*$ in (6.7) and $\kappa$
in (6.8) imply that for any $n$ large enough

(l V hh^^Jnhi^ < 2Vl + A;ln2, \/k > k. (6.9) 

Using Proposition 2, the last inequality yields

$$E_f\,\big|\hat f^{(k)}(x_0) - f(x_0)\big|^r\,\mathbf{1}_{G_\delta} \le C_r\,S_n^r(k), \qquad \forall k \ge \kappa. \tag{6.10}$$

To get this result we have applied Proposition 2 with $h = h_k$ and (6.9). We
also have

E/i/^'H^o)-/(xo)r 

= E;|/W(xo) - f{xo)\\^^^H + E/|/(^Hxo) - f{xo)\\^^^H 

■.= R,{f) + R2{f) + R;{f).' (6.11) 

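First we control $R_1$. By convexity of $|\cdot|^r$, $r > 1$, and the
triangle inequality, we have

$$\big|\hat f^{(\hat k)}(x_0) - f(x_0)\big|^r \le 2^{r-1}\,\big|\hat f^{(\hat k)}(x_0) - \hat f^{(\kappa)}(x_0)\big|^r + 2^{r-1}\,\big|\hat f^{(\kappa)}(x_0) - f(x_0)\big|^r.$$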

The definition of k in (4.5) yields 

where the constant C is defined in (4.6). In view of (6.10), the definitions of 
Hk lead to 

E/l/^'^^Xo) - f{Xo)\\^^^^., < Or S:{^), 
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where is defined in (6.4). Noting that the right hand side of the obtained 
inequahty is independent of / and taking into account the definition of k and 
h* we obtain 

hmsup sup < oo. (6.12) 

n-5>oo f<md{P,L,A,M) 

Now, let us bound $R_2$ from above. Applying the Cauchy-Schwarz inequality,
in view of Lemma 5 we have for $n$ large enough

i?2(/) = ^E^|/('=)(xo)-/(xo)rV, 



k=K 



k>K 

1 /2 



-{k-l)rd 



We obtain from (6.10) and the last inequality 

(^^max) s>0 



It remains to note that the right hand side of the last inequality is indepen- 
dent of /. Thus, we have 

limsup sup 0~^(/3)i?2(/) < oo. (6.13) 

n->oo /6Hd(/3,L,A,M) 

It remains to bound $R_3(f)$. By definition, $|\hat f^{(\hat k)}(x_0)| \le M$,
which allows us to state that $|\hat f^{(\hat k)}(x_0) - f(x_0)| \le 2M$.
Finally we obtain

$$R_3(f) \le (2M)^r\,P_f\{G_\delta^c\}.$$

Since nh'^^^ = (inn)^'^, then 

limsup sup 0-'^(/3)i?3(/) < cx), (6.14) 

n^oo feMa(l3,L,A,M) 

follows now from Lemma 4. Theorem 2 then follows from (6.11), (6.12), (6.13)
and (6.14).
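The selection rule analysed in this proof can be sketched in a few lines of Python. The sketch assumes a dyadic grid $h_l = 2^{-l} h_{\max}$, thresholds of the form $S_n(l) = ((1 + l \ln 2)/(n h_l^d))^{1/2}$ and the rule "take the smallest index $k$ whose estimate agrees with all estimates of larger index up to $C\,S_n(l)$". These forms are inferred from the events used in the proofs of Theorem 2 and Lemma 5; the precise definitions (4.5) and (4.6) lie outside this section, and the names `threshold` and `lepski_select` are hypothetical.

```python
import math


def threshold(l, n, h_max, d):
    """Assumed threshold S_n(l) for the bandwidth h_l = 2**(-l) * h_max."""
    h_l = 2.0 ** (-l) * h_max
    return math.sqrt((1.0 + l * math.log(2.0)) / (n * h_l ** d))


def lepski_select(estimates, thresholds, c):
    """Return the smallest k with |estimates[k] - estimates[l]| <= c * thresholds[l] for all l > k.

    estimates[l] is the estimator computed with the l-th bandwidth
    (largest bandwidth first). The last index always qualifies.
    """
    last = len(estimates) - 1
    for k in range(last):
        if all(
            abs(estimates[k] - estimates[l]) <= c * thresholds[l]
            for l in range(k + 1, last + 1)
        ):
            return k
    return last


if __name__ == "__main__":
    n, h_max, d = 1000, 1.0, 1
    # Hypothetical estimates: strongly biased for the two largest bandwidths.
    estimates = [5.0, 1.2, 1.0, 1.05, 0.9]
    thresholds = [threshold(l, n, h_max, d) for l in range(len(estimates))]
    print(lepski_select(estimates, thresholds, c=1.0))
```

The rule keeps the largest bandwidth (smallest index) for which the estimate is not significantly different from all less smoothed estimates, which is the mechanism behind the bounds on $R_1$ and $R_2$ above.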
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7 Appendix 

7.1 Proof of Lemma 1 

1. By definition of the Jacobian matrix $J_D(\cdot)$ in (6.3), we can write
for any $p, q \in S_b$

•JDiO)] = - E / K{x) [ p"{z - f-,_,{y + hx)) gi{z)dz dx. 

I Jp,g ~7 J [-1/2,1/2]'* JR 

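Applying this formula with $\tilde\theta = \theta$, the term
$f_{\tilde\theta - \theta}$ vanishes, so:

$$J_D(\theta) = \frac{1}{n}\sum_{i=1}^n \int_{\mathbb{R}} \rho''(z)\,g_i(z)\,dz \int_{[-1/2,1/2]^d} U(x)\,U^\top(x)\,K(x)\,dx,$$

where $U(\cdot)$ is defined in (2.1). Since $(g_i)_i \in \mathcal{G}_\rho^{(n)}$,
the definition of $c$ in (2.9) implies that
$\frac{1}{n}\sum_{i=1}^n \int_{\mathbb{R}} \rho''(z)\,g_i(z)\,dz \ge c > 0$.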

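Now we show that $J_D(\theta)$ is a strictly positive definite matrix.
Indeed, for any $\tau \in \mathbb{R}^{N_b}\setminus\{0\}$,

$$\begin{aligned}
\tau^\top J_D(\theta)\,\tau
&= \frac{1}{n}\sum_{i=1}^n \int_{\mathbb{R}} \rho''(z)\,g_i(z)\,dz\ \tau^\top \int U(x)\,U^\top(x)\,K(x)\,dx\ \tau \\
&= \frac{1}{n}\sum_{i=1}^n \int_{\mathbb{R}} \rho''(z)\,g_i(z)\,dz \int \big[\tau^\top U(x)\big]^2 K(x)\,dx \\
&\ge c \int \big[\tau^\top U(x)\big]^2 K(x)\,dx > 0.
\end{aligned} \tag{7.1}$$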
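The positivity in (7.1) can be checked numerically in a simple case. The sketch below assumes $d = 1$, a uniform kernel $K = \mathbf{1}_{[-1/2,1/2]}$ and $U(x) = (1, x, \dots, x^b)^\top$ (the vector of monomials); the actual definition of $U$ in (2.1) is not reproduced in this section, so this choice is an assumption, and the function names are illustrative. Positive definiteness is tested through a hand-written Cholesky factorization.

```python
import math


def gram_matrix(b):
    """Entries int_{-1/2}^{1/2} x^(i+j) dx of int U U^T K for monomials and a uniform kernel."""
    size = b + 1
    gram = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            k = i + j
            # Odd powers integrate to zero on a symmetric interval.
            gram[i][j] = 0.0 if k % 2 else 0.5 ** k / (k + 1)
    return gram


def is_positive_definite(a):
    """Cholesky test: succeeds if and only if the symmetric matrix `a` is positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    return False
                low[i][j] = math.sqrt(pivot)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return True


def quadratic_form(a, tau):
    """Compute tau^T a tau."""
    return sum(tau[i] * a[i][j] * tau[j] for i in range(len(a)) for j in range(len(a)))


if __name__ == "__main__":
    g = gram_matrix(2)
    print(is_positive_definite(g), quadratic_form(g, [1.0, -2.0, 3.0]))
```

In this setting the quadratic form equals $\int [\tau^\top U(x)]^2 K(x)\,dx$, which vanishes only for the zero polynomial, in line with the last inequality of (7.1).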

Let us show that for any $h > n^{-1/d}$, $\theta$ is the unique solution of
$D_h(\cdot) = (0, \dots, 0)$. By definition in (6.2), $D_h$ can be written as

$$D_h^p(t) = -\frac{1}{n}\sum_{i=1}^n \int_{[-0.5,0.5]^d} x^p\,K(x) \int_{\mathbb{R}} \rho'\big(z - f_{t-\theta}(y + hx)\big)\,g_i(z)\,dz\,dx. \tag{7.2}$$

Moreover, we have that

$$D_h(t) = (0, \dots, 0) \implies \sum_{p \in S_b} (t_p - \theta_p)\,D_h^p(t) = 0.$$

Denote $u(\cdot) = f_{t-\theta}(y + h\,\cdot)$. Since for any $i$, $g_i$ is
monotone on $\mathbb{R}_+$ and symmetric, then

$$\forall x \in [-0.5,0.5]^d, \qquad \inf_{z > 0}\ \inf_{i=1,\dots,n}\ g_i\big(z - |u(x)|\big) - g_i\big(z + |u(x)|\big) \ge 0. \tag{7.3}$$

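Since $(g_i)_i$ are symmetric, $K$ is positive and $\rho'$ is odd and
positive on $\mathbb{R}_+^*$, the last inequality and (7.2) imply

$$\begin{aligned}
&\int_{[-0.5,0.5]^d} u(x)\,K(x) \int_{\mathbb{R}} \rho'\big(z - u(x)\big)\,\frac{1}{n}\sum_{i=1}^n g_i(z)\,dz\,dx = 0 \\
\iff\ &\int_{[-0.5,0.5]^d} u(x)\,K(x) \int_{\mathbb{R}} \rho'(z)\,\frac{1}{n}\sum_{i=1}^n g_i\big(z + u(x)\big)\,dz\,dx = 0 \\
\iff\ &\int_{[-0.5,0.5]^d} u(x)\,K(x) \int_0^\infty \rho'(z)\,\frac{1}{n}\sum_{i=1}^n \Big[g_i\big(z - u(x)\big) - g_i\big(z + u(x)\big)\Big]\,dz\,dx = 0 \\
\iff\ &\int_{[-0.5,0.5]^d} |u(x)|\,K(x) \int_0^\infty \rho'(z)\,\frac{1}{n}\sum_{i=1}^n \Big[g_i\big(z - |u(x)|\big) - g_i\big(z + |u(x)|\big)\Big]\,dz\,dx = 0 \\
\implies\ &\forall x \in [-0.5,0.5]^d,\ z > 0, \qquad \sum_{i=1}^n g_i\big(z - |u(x)|\big) - g_i\big(z + |u(x)|\big) = 0.
\end{aligned} \tag{7.4}$$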

Assume that there exists $x_0 \in [-0.5,0.5]^d$ such that $u(x_0) \ne 0$. In
particular, for any $i$, since $g_i$ is monotone on $\mathbb{R}_+$, there
exists $z_{i,x_0} > 0$ such that

$$g_i\big(z_{i,x_0} - |u(x_0)|\big) - g_i\big(z_{i,x_0} + |u(x_0)|\big) > 0.$$

That leads to a contradiction in view of (7.3) and (7.4); thus for any
$x \in [-0.5,0.5]^d$, we have $u(x) = 0$. By definition of $u(\cdot)$, we get

$$\forall x \in [-0.5,0.5]^d,\ h > n^{-1/d}, \qquad \big|f_{t-\theta}(y + hx)\big| = 0 \implies t = \theta.$$

Then, $\theta$ is the unique solution of $D_h(\cdot) = 0$.

2. Let $|||\cdot|||_2$ be the Euclidean matrix norm, $\lambda_{\max}(A)$ the
spectral radius of the matrix $A$ and $\lambda_0(A)$ the smallest eigenvalue
of $A$. According to Lemma 1 (1), there exists a radius $\delta > 0$, which
only depends on $\rho$, such that

$$\inf_{\tilde\theta \in B(\theta,\delta)} \lambda_0\big(J_D(\tilde\theta)\big) \ge \lambda_0\big(J_D(\theta)\big)/2 > 0. \tag{7.5}$$

This assertion can be explained as follows. In view of Assumption 1 (3),
$\lambda_0(J_D(\cdot))$ is a continuous function. So, there exists a radius
$\delta > 0$, as expected, such that (7.5) is true.

According to the local inverse function theorem, we can deduce that for
any $\tilde\theta \in B(\theta, \delta)$,

\\\JD-^m\2 = mn'mU = Xm..{Jn\0)) = 1/Ao(JdW), (7.6) 

By definition of Qp^\ we have for any {gi)i G Qp'^\ c = c{p, {gi)i) and according 
to (7.1) A = Ao(Jd(6')) > 0. The smallest eigenvalue of Jd{6) is bigger than 
cA > 0. Indeed we have 



JD{d) = -j^f p"{z)g,{z)dz [ 

'^~l^M ^[-1/2,1/2]' 



U{x) U^{x) K{x) dx. 



and 
A 



^{JD{e)) = - V / p"{z)g,{z)dz \Jf 

^—[Jm. \J[-1/2,1/2]' 



U{x) U^{x) K{x) dx 



> cA. 



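By definition of $\delta$ in (7.5), using (7.6) and the last inequality, we
have for any $\tilde\theta \in B(\theta, \delta)$

$$|||J_{D^{-1}}(\tilde\theta)|||_2 \le \frac{2}{c\lambda}. \tag{7.7}$$

As $D_h$ is differentiable and each partial derivative is continuous (cf.
Assumption 1 (3)), we use the local inverse function theorem and (7.7),
which give for any $\tilde\theta \in B(\theta, \delta)$ the following
inequality

$$\|\tilde\theta - \theta\|_2 \le \frac{2}{c\lambda}\,\big\|D_h(\tilde\theta) - D_h(\theta)\big\|_2.$$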



7.2 Proof of Lemma 2 

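By definition of $\mathcal{E}_h$ and $D_h$ in (6.2), we have for any
$t \in \Theta(M)$

$$\big|\mathcal{E}_h^p(t) - D_h^p(t)\big| \le \frac{1}{nh^d}\sum_{i=1}^n \int_{[0,1]^d} \left|\Big(\frac{x - x_0}{h}\Big)^p\right| K\Big(\frac{x - x_0}{h}\Big) \int_{\mathbb{R}} \Big|\rho'\big(z + f(x) - f_t(x)\big) - \rho'\big(z - f_{t-\theta}(x)\big)\Big|\,g_i(z)\,dz\,dx.$$

Since $\rho'$ is 1-Lipschitz (cf. Assumption 1 (2)) and $\int K = 1$, the
last inequality yields

$$\forall h > n^{-1/d}, \qquad \max_{p \in S_b}\ \sup_{t \in \Theta(M)} \big|\mathcal{E}_h^p(t) - D_h^p(t)\big| \le b_h.$$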



7.3 Proof of Lemma 3 

Bernstein's Inequality. To prove this lemma, we use the following well-known
Bernstein inequality, which can be found in Massart [2007] (Section 2.2.3,
Proposition 2.9). Let $\mathcal{X}_1, \dots, \mathcal{X}_n$ be independent
square integrable random variables such that, for some nonnegative constant
$\mathcal{X}_\infty$, $\mathcal{X}_i \le \mathcal{X}_\infty$ almost surely
for all $i = 1, \dots, n$. Then for any positive $\varepsilon$, we have



(7.8) 



where $E = E^n$ is the mathematical expectation with respect to the
probability law $P$ of $\mathcal{X}_1, \dots, \mathcal{X}_n$. The latter
inequality is the so-called Bernstein inequality.
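As a numerical illustration, the Python sketch below uses a standard one-sided form of Bernstein's inequality: for independent variables bounded above by $b$, with $v = \sum_i E[\mathcal{X}_i^2]$, one has $P(\sum_i (\mathcal{X}_i - E\mathcal{X}_i) \ge \varepsilon) \le \exp\{-\varepsilon^2 / (2(v + b\varepsilon/3))\}$. This is the usual textbook statement and is an assumption here; the exact display (7.8) of the paper is not reproduced in this section and may be stated differently (for instance in a two-sided form). The uniform variables and the function names are illustrative choices.

```python
import math
import random


def bernstein_bound(eps, variance_sum, upper):
    """Assumed one-sided Bernstein bound exp(-eps^2 / (2 (v + b eps / 3)))."""
    return math.exp(-eps ** 2 / (2.0 * (variance_sum + upper * eps / 3.0)))


def empirical_tail(eps, n, trials, seed=0):
    """Monte Carlo estimate of P(sum_i X_i >= eps) for X_i uniform on [-1, 1] (zero mean)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        if s >= eps:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, eps = 50, 10.0
    v = n / 3.0  # E[X^2] = 1/3 for the uniform law on [-1, 1]
    print("Bernstein bound:", bernstein_bound(eps, v, 1.0))
    print("empirical tail: ", empirical_tail(eps, n, trials=2000))
```

The bound is not tight, but it decays like a Gaussian tail for moderate $\varepsilon$ and like an exponential tail for large $\varepsilon$, which is the behaviour exploited in the chaining argument.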



Proof of Lemma 3. We have for any p & Si 

sup mt)-Diit) 

In view of Lemma 2, we get 



< sup 



Dm -Slit) + sup \sm-Dm\ 

tee{M) 



sup 

tee(M) 



m) - Di{t) 



< sup 

tee(A/) 



Dl{t)-8l{t) 



+ bh. 



(7.9) 



Set $L(\cdot) = \tilde D_h^p(\cdot) - \mathcal{E}_h^p(\cdot)$. To establish
the assertion of the lemma, we use a chaining argument on $L(\cdot)$.
Remember that $\Theta(M)$ is a compact subset of $\mathbb{R}^{N_b}$ endowed
with the $\ell_1$-norm. Let $t_0 \in \Theta(M)$ be fixed and, for any
$l \in \mathbb{N}^*$, let $\Gamma_l$ be a $10^{-l}$-net on $\Theta(M)$. We
introduce the following notations:

$$u_0(t) = t_0, \qquad u_l(t) = \arg\inf_{u \in \Gamma_l} \|u - t\|_1, \quad l \in \mathbb{N}^*.$$

Since $\rho'$ is continuous, $L(\cdot)$ is stochastically continuous, which
allows us to use the following chaining argument:

$$L(t) = L(t_0) + \sum_{l=1}^{\infty} \Big[L\big(u_l(t)\big) - L\big(u_{l-1}(t)\big)\Big], \qquad \forall t \in \Theta(M). \tag{7.10}$$

Using (7.9) and (7.10), we obtain 



P/ Vn/^ sup bl{t) - Dl{t) 
tee{M) 



> z 



<Fj sup \L{t)\ > — =-6^ 



tee{M) 



<Fji\L{to)\+ sup y2\L{ui{t)) -L{ui_^{t))\> 
\ te0{M) 

We can control the second term as follows. 



hh ■ (7.11) 



sup V|L(uKt)) -iv(n,„i(t))| < V sup \L{u)-L{v)\ 

|ju-D||i<10-' 

where Fq = {to}. Using (7.11) and last inequality, we get 



Ff\Vnf? sup \L{t)\ > z - bh Vnf? 
tee{M) 



< ¥f i^V^ \L{to)\ > z/2 - hh Vn}?/2 



/ 



Vnf?^ sup \L{u) - L{v)\ > z/2 -bhVn}?/2 
1=1 



\ ||m-i>||i<10-' 

By Definition of Df^ in (6.1), we can write: 

i=l ^ 



. (7.12) 



Xi — Xo\'^ f Xi — Xq 



We define the function y\;t(x, 2) = p' {z + f (x) - ft{x)) K {^) 

for all X G [0, 1]'^ and 2; G M. Since for any i, Yi = f{Xi) + ^j, the process 
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\/nh^ L{-) can be written as an empirical process (sum of independent, zero- 
mean and bounded random variables). 

n 

v^L(t) = Y,^t{Xi,^,) -EfWt{Xi,^,), t e Q{M) (7.13) 
1=1 

At a fixed point to, we can use classical exponential inequalities for empirical 
process. By definition of Wti-, ■) above, we have 

n 

J2^fW^{Xi,^,) < pI Kl, \\Wt{., .)||oo < Poo i^oo/v^, (7.14) 

i=l 

where || ■ ||oo is the sup-norm. 

For the control of the first probability of (7.12), we use the Bernstein's 
inequality (7.8), then 



Ff ( VnJ? 



< 2 exp 



> 



8plKl + '-^{z-bh v^) 



(7.15) 



The second probability can be bounded as follows: 
/ 



vnh'^y sup \L{u) — L[v)\ > 



v 



||it— d||i<10~' 



/—-; ^ 1 ,2 I r / \ r / \ I ^ Vnh/^ 

y nh"- y — sup/ sup — L[v)\ > 



v 



1=1 



1>1 u,u6r;xrj_ 
|l«-t)|li<10- 

^2 



J2 Fj -I' \L{u) - L{v) I > - - ^h— 



1 = 1 tl.ugr; Xr;_ J 



.(7.16) 



In view of (7.13), we notice that 

n 



i=l 
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then we have a sum of independent zero-mean random variables with finite 
variance and bounded. Since p' is assumed Lipschitz, we have the following 
assertions. 



\u — v\ 



15 



1=1 



|>V„(.,.)-W.(.,.)||oo < K^\\u~v\\i/Vnhd 



Using (7.16), the Bernstein's inequality (7.8) and the last three inequalities, 
we obtain 



< 



y/nh^'^^ sup |L(u) — L(t>)| > ^ 
36||m — v\\i^ 



\ 



l^l u.fer;xr,_i 
||«-d||i<10-' 



exp 



= 1 «,t,er,xr,_i 
||M-i)j|i<10-' 



7r4 /4 



X 



8KI \\u-v\\i + ^^ 



z — bh Vnh/^^ 



<2 5^#(r,)#(r,_o exp 



3610' 



z — bh V nh'^ 



7r4 /4 



(7.17) 



where ij^iTi) is the cardinal of Ti. Moreover, we notice that < dlOK 

Recall that z > 2(l V bh y/nh^) and we notice that min — :rTT- > 1. The last 
assertions allows us to write that 



exp 



< 



36 10' 



(^z — bh V nh^ 



exp 



18 10' m^)-^ 



TT^ Z4 + 1/3 



X exp 



z — b^y/ nh'^ 



8KI 



z — bfi y/nh/^ 
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Using (7.11), (7.12), (7.15), (7.17) and the last inequality, we have for any 
p eSb 



Ff I sup Dl{t) - £l{t) 

tee{M) 



> z 



< Z'exp 



z — bh y/nh^^ 



where E is defined in (2.8). This concludes the proof of Lemma 3. 



7.4 Proof of Lemma 4 

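Remember that the event $G_\delta^c$ can be written as
$G_\delta^c = \{\hat\theta(h) \notin B(\theta, \delta)\}$, and that
$\hat\theta(h)$ and $\theta$ are respectively the solutions of the equations
$\tilde D_h(\cdot) = 0$ and $D_h(\cdot) = 0$. Moreover, $\theta$ is the
unique solution of $D_h(\cdot) = 0$, so we can notice the following inclusion

$$\big\{\hat\theta(h) \notin B(\theta, \delta)\big\} \subset \Big\{\sup_{t \in \Theta(M)\setminus B(\theta,\delta)} \big\|\tilde D_h(t) - D_h(t)\big\|_2 \ge \varkappa_\delta\Big\}, \tag{7.18}$$

where $\varkappa_\delta = \inf_{t \in \Theta(M)\setminus B(\theta,\delta)} \|D_h(t)\|_2/2$.
The latter inclusion can be interpreted as follows. In view of Lemma 1 (1),
$\theta$ is the unique solution of $D_h(\cdot) = 0$, thus $D_h(\cdot)$ does
not vanish on $\Theta(M)\setminus B(\theta, \delta)$. Moreover, $D_h(\cdot)$
does not depend on $n$, so $\varkappa_\delta$ is positive and does not depend
on $n$. The event $\{\hat\theta(h) \notin B(\theta, \delta)\}$ implies that
there exists $\tilde\theta \in \Theta(M)\setminus B(\theta, \delta)$ such
that $\tilde D_h(\tilde\theta) = 0$; hence, on a neighborhood of
$\tilde\theta$, $\tilde D_h(\cdot)$ and $D_h(\cdot)$ are not close. So, there
exists $\tilde\theta \in \Theta(M)\setminus B(\theta, \delta)$ such that

$$\big\|\tilde D_h(\tilde\theta) - D_h(\tilde\theta)\big\|_2 \ge \varkappa_\delta.$$

Then, the latter inequality implies (7.18) by passing to the supremum.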
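Applying the inclusion (7.18), we obtain

$$P_f\big(G_\delta^c\big) \le \sum_{p \in S_b} P_f\Big(\sup_{t \in \Theta(M)\setminus B(\theta,\delta)} \big|\tilde D_h^p(t) - D_h^p(t)\big| \ge \frac{\varkappa_\delta}{\sqrt{N_b}}\Big).$$

Assumptions on $n$ and $h$ in Lemma 4 allow us to show that
$\sqrt{nh^d}\,\varkappa_\delta/\sqrt{N_b} \ge 2\big(1 \vee b_h\sqrt{nh^d}\big)$.
Using Lemma 3 with $z = \sqrt{nh^d}\,\varkappa_\delta/\sqrt{N_b}$, we have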



P/ (G^) < N,E exp 
The lemma is proved. 



8ir^(lVp^) + ^iro,(lVpoo) 



7.5 Proof of Lemma 5 

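Note that by definition of $\hat k$ in (4.5),

$$\forall k \ge \kappa + 1, \qquad \{\hat k = k\} = \bigcup_{l \ge k} \Big\{\big|\hat f^{(k-1)}(x_0) - \hat f^{(l)}(x_0)\big| > C\,S_n(l)\Big\}.$$

Note that $S_n(l)$ is monotonically increasing in $l$ and, therefore,

$$\{\hat k = k\} \subset \Big\{\big|\hat f^{(k-1)}(x_0) - f(x_0)\big| > 2^{-1} C\,S_n(k-1)\Big\} \cup \Big[\bigcup_{l \ge k} \Big\{\big|\hat f^{(l)}(x_0) - f(x_0)\big| > 2^{-1} C\,S_n(l)\Big\}\Big].$$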
We come to the following inequality: for any k> n + 1 
¥(k = k,Gf') < ¥\\f^''-^\x,)- f{x^)\>2-^C Sn{k-l\Gf] 

+ 5^ P 1 1 /« (a;o) - /(xo) I > S^l), G^ } . (7.19) 

l>k 

1 /2 

Notice that the definition of Sn{l) yields Sn{l) = [l + Zln2] . Thus, 

applying Proposition 1 with e = C [l + Hn 2] and h = hi and using the 
inequality (6.9), we obtain by definition of C in (4.6), for any I > k — 1 and 
n large enough 



f[\p\xo)-fixo)\>2-'CSnil)\ < N,ET 
We obtain from (7.19) and (7.20) that k>K + l 

^{k = A;,G^-) < j2-2(fc-iW, 
where J = NbE{l + (1 - 2'^'"'^)-^). 



2rdl 



(7.20) 
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